Computing semantic relatedness of words and texts in Wikipedia-derived semantic space

Authors

Evgeniy Gabrilovich

Shaul Markovitch

Abstract

Adequate representation of natural language semantics requires access to vast amounts of common-sense and domain-specific world knowledge. Prior work in the field was based either on purely statistical techniques that did not make use of background knowledge, or on huge manual efforts, such as the CYC project. Here we propose a novel method, called Explicit Semantic Analysis (ESA), for fine-grained semantic interpretation of unrestricted natural language texts. Our method represents meaning in a high-dimensional space of concepts derived from Wikipedia, the largest encyclopedia in existence. We use machine learning techniques to explicitly represent the meaning of any text in terms of Wikipedia-based concepts. We evaluate the effectiveness of our method on the task of automatically computing the degree of semantic relatedness between fragments of natural language text. Compared with the previous state of the art, ESA substantially improves the correlation of computed relatedness scores with human judgments: from r = 0.56 to 0.75 for individual words and from r = 0.60 to 0.72 for texts. Consequently, we anticipate that ESA will give rise to the next generation of natural language processing tools. Importantly, because it uses natural concepts, the ESA model is easy to explain to human users.
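The abstract does not spell out how a text is mapped into the concept space. The following is a minimal sketch, assuming a simplified TF-IDF and cosine formulation: each concept is a Wikipedia article (here a hypothetical three-article toy set), an inverted index gives every word a weight per concept, a text's concept vector is the frequency-weighted sum of its words' concept vectors, and relatedness is the cosine between two such vectors. The `CONCEPTS` data, the function names, and the exact weighting formula are illustrative assumptions; the paper's own weighting, normalization, and pruning details are not reproduced here.

```python
import math
from collections import Counter

# Hypothetical toy "Wikipedia": concept title -> article text (illustration only).
CONCEPTS = {
    "Cat": "cat cat feline pet mammal whiskers",
    "Dog": "dog dog canine pet mammal cat",
    "Stock market": "stock market shares trading investor",
}

def tokenize(text):
    return text.lower().split()

def build_index(concepts):
    """Inverted index: word -> {concept: TF-IDF weight} (assumed weighting)."""
    n = len(concepts)
    tfs = {c: Counter(tokenize(t)) for c, t in concepts.items()}
    df = Counter(w for tf in tfs.values() for w in tf)  # document frequency
    index = {}
    for concept, tf in tfs.items():
        for word, freq in tf.items():
            weight = (1 + math.log(freq)) * math.log(n / df[word])
            if weight > 0:  # words present in every concept carry no signal
                index.setdefault(word, {})[concept] = weight
    return index

def interpret(text, index):
    """Map a text to a weighted vector over concepts."""
    vec = Counter()
    for word, freq in Counter(tokenize(text)).items():
        for concept, weight in index.get(word, {}).items():
            vec[concept] += freq * weight
    return vec

def cosine(u, v):
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def relatedness(a, b, index):
    return cosine(interpret(a, index), interpret(b, index))

INDEX = build_index(CONCEPTS)
print(round(relatedness("cat", "dog", INDEX), 3))    # 0.509
print(relatedness("cat", "stock", INDEX))            # 0.0
```

With this toy data, "cat" and "dog" overlap on the Dog concept (whose article also mentions cats), so their score is positive, while "cat" and "stock" share no concepts and score 0. An evaluation like the one reported above would then correlate such scores with human ratings over a list of word or text pairs.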

Similar resources

Computing Semantic Relatedness Using Wikipedia-based Explicit Semantic Analysis

Computing semantic relatedness of natural language texts requires access to vast amounts of common-sense and domain-specific world knowledge. We propose Explicit Semantic Analysis (ESA), a novel method that represents the meaning of texts in a high-dimensional space of concepts derived from Wikipedia. We use machine learning techniques to explicitly represent the meaning of any text as a weight...

Wikipedia-Based Semantic Interpreter Using Approximate Top-k Processing and Its Application

Proper representation of the meaning of texts is crucial for enhancing many data mining and information retrieval tasks, including clustering, computing semantic relatedness between texts, and searching. Representing texts in the concept space derived from Wikipedia has received growing attention recently. This concept-based representation is capable of extracting semantic relatedness betwee...

Wikipedia-based Compact Hierarchical Semantics with Application to Semantic Relatedness

A proper semantic representation of words and texts underlies many text processing tasks. In this paper, we present a novel representation of semantics which is based on a hierarchical ontology of natural concepts derived from Wikipedia articles and category system. Our method, called Compact Hierarchical Explicit Semantic Analysis (CHESA) generates compact hierarchical representations of unre...

WikiWalk: Random walks on Wikipedia for Semantic Relatedness

Computing semantic relatedness of natural language texts is a key component of tasks such as information retrieval and summarization, and often depends on knowledge from a broad range of real-world concepts and relationships. We address this knowledge integration issue with a method of computing semantic relatedness using personalized PageRank (random walks) on a graph derived from Wikipedia. T...

Computing Semantic Similarity of Documents Based on Semantic Tensors

Exploiting semantic content of texts due to its wide range of applications such as finding related documents to a query, document classification and computing semantic similarity of documents has always been an important and challenging issue in Natural Language Processing. In this paper, using Wikipedia corpus and organizing it by three-dimensional tensor structure, a novel corpus-based approa...

Publication date: 2006